
#LLM serving

Records found: 2

#LLM serving · 26/10/2025

kvcached Unlocks Elastic KV Caching to Slash GPU Memory Waste for LLMs

kvcached provides a virtualized, elastic KV cache for LLM serving on shared GPUs, reducing memory waste and speeding activation across colocated models.


#LLM serving · 22/08/2025

CloudMatrix Supernode: Huawei's Peer-to-Peer Datacenter for Trillion-Scale LLMs

Huawei's CloudMatrix builds a peer-to-peer supernode combining 384 Ascend 910C NPUs and 192 Kunpeng CPUs to deliver high-throughput, low-latency LLM serving, with CloudMatrix-Infer optimizing MoE and KV cache workloads.
